Patent abstract:
A distributed technical infrastructure for an agent-based data traceability platform (DTP) system including at least one metadata database server with a layered architecture and computing resources, a data quality monitoring server, a management server for a data provisioning service (DPS), and an automatic polling machine, characterized in that the database is used for storing data and keeping secure evidence of data exchange, data ownership, and transfer of data ownership in data objects (DOs) to allow a user to access data sets (DO1, ..., DOn) provided by other users, and to propose a data set accessible by another user, and in which the data traceability platform adapts blockchain technology for the secure tracking of the ownership of a data set (DO) to be stored in the database, and sets up a consensus process by federation, during which a voting module decides the validity of a block in order to store the block in the database.
Publication number: FR3074322A1
Application number: FR1761423
Filing date: 2017-11-30
Publication date: 2019-05-31
Inventors: Tobias Rene Mayer; Yacine KESSACI; Frederic Oble
Applicant: Worldline SA
Main IPC class:
Patent description:

FIELD OF THE INVENTION
The present invention relates to the field of data management, and more specifically to a platform for securing and monitoring data exchange.
BACKGROUND OF THE INVENTION
The intensive proliferation of digital technology is creating a huge amount of data. Each piece of data has a certain inherent value, which makes data a commodity that is traded.
Personalized advertising based on user profiles is one example. As another example, on a commercial data analytics platform, "providers" can offer machine learning (ML) models, training data sets, and "ready-to-use" machine learning models, and "consumers" purchase this data for their own use. Machine learning models and training data have some inherent value. Trained machine learning models create value by combining existing data from other providers, that is, by applying training data sets over a significant period of time.
Data providers should therefore be rewarded if their data has been used to create a new data set that is sold on the platform, such as a trained machine learning model. However, the simple re-use of data poses a threat to intellectual property. A dishonest user can, for example, slightly modify the data set (bypassing conventional integrity checks) and offer the "new" data set as their own.
GENERAL DESCRIPTION OF THE INVENTION
The present invention aims to overcome certain drawbacks of the prior art by providing a means of securing and managing the exchange of data.
This goal is achieved by a distributed technical infrastructure intended for an agent-based data traceability platform (DTP) system comprising at least one metadata database server with layered architecture and computing resources, a server for data quality monitoring, a management server for a data provisioning service (DPS), and an automatic query machine, characterized in that the database is used to store data and keep secure evidence of data exchange, data ownership and transfer of ownership of data in data objects (DO) to allow a user to access data sets (DO1, ..., DOn) provided by other users, and to propose a data set accessible by another user, and in which the data traceability platform adapts blockchain technology for the secure monitoring of the ownership of a data set (DO) to be stored in the database, and sets up a consensus process by federation (FC), during which a voting module decides on the validity of a block in order to store the block in the database.
According to another particularity, the voting module of each agent of the data traceability platform of a "DTP federation", during the consensus by federation (FC), offers an individual vote on the validity, and the cumulative total of votes determines the validity of the block, at least the size and the participants in the DTP federation, and the validity threshold being configurable.
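The cumulative vote described above can be illustrated with a minimal Python sketch. It assumes that the validity threshold is a fraction of the federation size that the "valid" votes must exceed; the class name FederationVote and its fields are hypothetical and do not come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class FederationVote:
    """Tally of validity votes for one block (hypothetical structure)."""
    voters: frozenset            # IDs of the agents in the DTP federation
    threshold: float = 0.5       # assumed: fraction of "valid" votes to exceed
    votes: dict = field(default_factory=dict)  # agent ID -> True / False

    def cast(self, agent_id: str, is_valid: bool) -> None:
        # Only federation members may vote, and each agent votes once.
        if agent_id not in self.voters:
            raise ValueError(f"{agent_id} is not a federation member")
        if agent_id in self.votes:
            raise ValueError(f"{agent_id} has already voted")
        self.votes[agent_id] = is_valid

    def decision(self):
        """Return True or False once the outcome is certain, else None."""
        n = len(self.voters)
        yes = sum(self.votes.values())
        remaining = n - len(self.votes)
        if yes / n > self.threshold:
            return True          # enough "valid" votes have been cast
        if (yes + remaining) / n <= self.threshold:
            return False         # the threshold can no longer be exceeded
        return None              # still waiting for votes


tally = FederationVote(voters=frozenset({"a1", "a2", "a3", "a4"}))
tally.cast("a1", True)
tally.cast("a2", True)
tally.cast("a3", True)
print(tally.decision())
```

Both the federation membership and the threshold are constructor arguments here, which mirrors the statement that the size, the participants and the validity threshold are configurable.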
In another feature, blockchain technology is adapted to include a prospective model allowing a service provider to provide computational information, by storing at least the assertions of an access and control (AC) approach in the context of a transaction, thus containing purely prospective information on the calculation which specifies the actual logic executed.
In another feature, blockchain technology is adapted to include a retrospective model by taking advantage of the functionality of storing and logging data in order to maintain processing information.
According to another particular feature, the blockchain technology is adapted to include a mixed prospective/retrospective model with the functionality of a distributed workflow management system (WMS), the prospective and retrospective information being combined in order to allow the calculation to be followed and the reliability of the calculation result to be assessed.
According to another particularity, the distributed technical infrastructure comprises at least one public key infrastructure (PKI) which makes it possible to use asymmetric cryptographic tools to secure the transactions of a set of data or of data objects (DO) and track ownership of a dataset.
According to another particular feature, a unique public and private key is generated for the creation of an agent account, in order to allow said agent to use at least one asymmetric cryptographic tool of the public key infrastructure.
According to another particular feature, said unique public and private key also allows said agent to act as an active DTP entity, which is active in a machine which is running a DTP instance and which is responsible for the actions of the instance using the digital signatures of all communications that use said private key.
According to another particular feature, the distributed technical infrastructure comprises at least one access and control system (AC) which maintains access to the services available and / or which controls the creation of fraudulent services within the platform, while the access rights are specified by each service provider which creates a proof of access right for the user concerned using the blockchain data management functionalities.
According to another particular feature, the AC system allows inter- and intra-service traceability if appropriate assertions are provided by a service provider.
According to another particular feature, the service provider specifies, for a consumable digital service, using an interactive interface based on a Web browser, provenance metadata which allow inter-service and intra-service traceability, and at least details about the calculation performed and the expected output data objects (DO).
In another feature, the database is distributed with linear scaling performance, optimized for write operations, and designed to handle high workloads and parallel operation.
According to another particular feature, the database is Apache Cassandra.
According to another particular feature, the database comprises at least one storage layer comprising at least one storage of transaction groups, a block queue storage, a block chain storage, an invalid block storage, and a data lake.
According to another particular feature, said storage layer comprises at least one set of modules: a transaction group module intended to process the transactions which enter the transaction group storage and are going to be stored in the blocks of the block queue, a block queue module intended to process blocks which have not yet been validated and which are subject to consensus for distributed block validation, a block chain module intended to process validated blocks, an invalid block repository module for processing blocks that have been found invalid during block validation by consensus by a block voting module, and a storage management module for storing and processing additional entities that are part of the blocks and transactions, in particular the votes which are required for block voting, the block statuses used to persist the result of a block validity vote, and data lake elements (DLE), which are extracted to improve performance because of their variable data size.
According to another particularity, the execution of the storage management module allows the transaction group storage at least to receive incoming raw transactions, to verify the transactions using a deep validation algorithm, to allocate the incoming transactions to a DTP agent through a selection strategy, and to add said verified transactions to the transaction group storage.
According to another particularity, the deep validation (DV) makes it possible to successively validate arbitrary data across all the DTP architectural layers, thereby also validating data that may be deeply nested, while the validation logic is provided by means of "validators" that perform validation in a semantically encapsulated form optimized for concurrent execution.
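As an illustration of deep validation, the following Python sketch walks a nested structure and applies one encapsulated validator per entity type. The "type" field, the VALIDATORS registry and the checks themselves are assumptions made for this example; the sketch runs sequentially and does not attempt the concurrent execution mentioned above.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: one validator per entity type. Each validator checks
# only its own fields; nested entities are reached by the recursion below.
Validator = Callable[[Dict[str, Any]], bool]

VALIDATORS: Dict[str, Validator] = {
    "block": lambda e: isinstance(e.get("id"), str),
    "transaction": lambda e: isinstance(e.get("id"), str) and "issuer" in e,
    "vote": lambda e: isinstance(e.get("valid"), bool),
}


def deep_validate(value: Any) -> bool:
    """Recursively validate an entity and everything nested inside it."""
    if isinstance(value, dict):
        kind = value.get("type")
        if kind is not None:
            validator = VALIDATORS.get(kind)
            # Unknown entity types are rejected rather than silently accepted.
            if validator is None or not validator(value):
                return False
        return all(deep_validate(v) for v in value.values())
    if isinstance(value, (list, tuple)):
        return all(deep_validate(v) for v in value)
    return True  # scalars carry no nested structure to check


block = {
    "type": "block",
    "id": "b1",
    "transactions": [{"type": "transaction", "id": "t1", "issuer": "agent-1"}],
}
print(deep_validate(block))
```

Keeping each validator independent of the others is what would make it possible to run them concurrently in a fuller implementation.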
In another feature, the transaction group module identifies the transactions checked with a large margin using the assigned DTP agent who is responsible for the execution of the transaction group module, creates blocks using the authorization of the transactions, transmits the blocks to be stored in the block queue, and deletes the transactions used from the storage of transaction groups.
According to another particular feature, the execution of the voting module makes it possible at least to check whether the "ID" of an agent is in a list of voters stored in the storage layer of the database, to execute a deep validation process to determine the validity of a block, to add a vote to the block corresponding to the result of the validation, and to trigger or not a timer for each missing vote.
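The four steps of the voting module can be sketched as follows in Python. The function name, the dictionary layout of a block, the `validate` callback (standing in for the deep validation process) and the 30-second timeout are all assumptions made for illustration; the timer is modeled simply as a stored deadline.

```python
def process_vote_request(agent_id, block, voter_list, validate, now, timeout=30.0):
    """Hypothetical voting step for one agent and one block."""
    # Step 1: only agents found in the stored voter list may vote.
    if agent_id not in voter_list:
        return block
    # Steps 2 and 3: validate the block and add the corresponding vote.
    block.setdefault("votes", {})[agent_id] = bool(validate(block))
    # Step 4: keep one deadline (timer) per vote that is still missing.
    deadlines = block.setdefault("vote_deadlines", {})
    deadlines.pop(agent_id, None)
    for voter in voter_list:
        if voter not in block["votes"]:
            deadlines.setdefault(voter, now + timeout)
    return block


blk = process_vote_request("a1", {"id": "b1"}, ["a1", "a2"], lambda b: True, now=100.0)
print(blk["votes"], blk["vote_deadlines"])
```

Existing deadlines are left untouched when a later vote arrives, so each missing vote keeps the timer that was started when it was first found to be missing.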
According to another particular feature, the storage layer comprises at least:
• a list of agents and services that exist on the platform. Said agents and services have been created and stored in a block which has been validated. Said list takes into account all the states created, deactivated and deleted.
• a block chain block cache memory (BBC), which is a cache memory of a certain number of blocks for which a decision has been made (stored in the block chain or in the invalid block repository), which are kept in memory for performance reasons. Said cache memory is implemented as a queue of limited size (configuration parameter) which uses underlying HashMaps for an efficient search by at least the block IDs (block identities) and the transaction IDs;
• a consensus block cache memory (CBC), which is analogous to the block chain cache memory and which is used for blocks for which no decision has been made;
• a voting list, kept by each agent, which includes the blocks for which a vote is required by the corresponding agent, that is to say when the agent ID has been added to the list of "voting agents" of the block (IDs of the agents who must provide a vote), but no vote has yet been provided;
• a transaction cache memory (PTC), which stores transactions from the transaction group;
• a list of transaction authorizations (TLC). Said list contains at least the following information in order to decide whether a transaction can be added to a block intended to be stored:
- a previous transaction (if any) is stored in a validated block
- an agent issuing a transaction is not blacklisted
- all the data (DLE, agent, etc.) are available in the system in order to verify them.
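The block chain block cache (BBC) in the list above is described as a size-limited queue backed by hash maps. A minimal Python sketch of such a structure is given below; the class name BlockCache, the "id" and "transaction_ids" fields, and the first-in-first-out eviction policy are assumptions, since the description does not specify them.

```python
from collections import OrderedDict


class BlockCache:
    """Size-limited FIFO cache of blocks, searchable by block or transaction ID."""

    def __init__(self, max_blocks: int):
        self.max_blocks = max_blocks      # the size configuration parameter
        self._blocks = OrderedDict()      # block ID -> block, oldest first
        self._tx_index = {}               # transaction ID -> block ID

    def put(self, block: dict) -> None:
        if block["id"] in self._blocks:
            return
        self._blocks[block["id"]] = block
        for tx_id in block["transaction_ids"]:
            self._tx_index[tx_id] = block["id"]
        if len(self._blocks) > self.max_blocks:
            # Evict the oldest block and drop its transaction index entries.
            _, evicted = self._blocks.popitem(last=False)
            for tx_id in evicted["transaction_ids"]:
                self._tx_index.pop(tx_id, None)

    def by_block_id(self, block_id):
        return self._blocks.get(block_id)

    def by_transaction_id(self, tx_id):
        block_id = self._tx_index.get(tx_id)
        return self._blocks.get(block_id) if block_id is not None else None


cache = BlockCache(max_blocks=2)
cache.put({"id": "b1", "transaction_ids": ["t1"]})
cache.put({"id": "b2", "transaction_ids": ["t2", "t3"]})
cache.put({"id": "b3", "transaction_ids": ["t4"]})
print(cache.by_block_id("b1"), cache.by_transaction_id("t3")["id"])
```

The same structure could serve for the consensus block cache (CBC), which the list describes as analogous but holding undecided blocks.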
According to another particular feature, the traceability layer comprises at least one “agents” module intended for storing the traceability agent data model, a data object (DO) module intended for storing the data model of traceability data objects (DO), a “service” module intended to store the data model of the traceability service, and a “transactions” module intended to store the transactions of the traceability layer.
According to another particularity, the traceability entities are also identified by a "static ID" which does not change during an update and includes at least:
• Agent owner (AgentOwner): "agent_owner_id" (unique ID)
• Agent: "agent_id" (public key)
• Service: "service_id" (unique ID) + "service_version"
According to another particularity, in the storage layer, at least one of the following requests is taken into account to guarantee the efficient processing of information linked to a block:
• Obtaining blocks by ID;
• Obtaining the block creation time;
• Obtaining storage transactions by block ID;
• Obtaining status information by block ID;
• Obtaining votes by block ID;
• Obtaining DLE by storage transaction ID.
According to another particular feature, at least one of the following requests is taken into account for the traceability of the data in the traceability layer, which are carried out as database requests with the corresponding information:
• Obtaining the transaction to be traced by data object ID;
• Obtaining the transaction to be traced by service ID;
• Obtaining agents by transaction ID to trace;
• Obtaining services by transaction ID to be traced;
• Obtaining agents by agent owner ID;
• Obtaining services by service ID.
According to another particularity, the voting module having high performance and a configurability functionality, allows:
• to adjust the vote according to a reduced federation size, with a reduced voting system time and reduced storage latencies,
• to instantly store the vote without the architectural design requiring a minimum storage period.
In another feature, the database storage layer technology is scaled by design to achieve high total storage throughput.
According to another particular feature, the configurability of the distributed technical infrastructure is provided by a program and by layer-specific configuration files, each layer of said infrastructure being capable of providing individual configurations.
In another feature, the data provisioning service (DPS) keeps proof of activity transparent to the user.
In another feature, the data provisioning service (DPS) maintains the data provisioning activity, which requires the indication of related metadata, such as references to the data sets used by others.
In another feature, the data provisioning service (DPS) keeps the complete history of a data set by analyzing the evidence and following the references within the metadata, in order to allow the complete history of a data set to be monitored and to identify all the users who may have contributed to its creation.
In another feature, the Data Provisioning Service (DPS) extends pure ownership functionality with operations to maintain the datasets on the platform.
In another feature, the platform receives a request from a complex data analysis service using machine learning (ML) models and training data sets available on the platform in order to offer a machine learning analysis service.
According to another feature, the system includes:
• a block chain; and
• a computing resource intended to execute a loop such that the execution of the loop is influenced by the state of the block chain, the loop being implemented using a script, in order to maintain a count of one or more votes for consensus-related block validation, or for block validity decisions, and a corresponding block status generated for or associated with the block extracted from the block queue; and a set of invalidity conditions, defined by the federation to determine the validity of the blocks and validated by each agent taking part in a vote, is evaluated and at least one action is taken based on the result of the evaluation; said action comprising at least:
- having at least one transaction written to the blockchain volume of the database; and/or
- having an off-block-chain action executed.
According to another particular feature, for each iteration of the loop, a cryptographic hash of the script is generated, and the information relating to at least one iteration of the loop is stored in a transaction on the block chain; the information being stored as metadata in the transaction.
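A minimal Python sketch of such a loop is given below. It assumes SHA-256 as the cryptographic hash, models the block chain as a plain list, and uses hypothetical metadata keys ("iteration", "script_hash", "valid_votes"); none of these choices is stated in the description.

```python
import hashlib


def run_vote_loop(script_source: str, vote_batches, chain: list) -> None:
    """Count votes per iteration and record each iteration as a transaction."""
    valid_votes = 0
    for iteration, batch in enumerate(vote_batches):
        valid_votes += sum(1 for vote in batch if vote)
        # A hash of the script is generated for every iteration of the loop.
        script_hash = hashlib.sha256(script_source.encode("utf-8")).hexdigest()
        # Information about the iteration is stored as transaction metadata.
        chain.append({
            "type": "transaction",
            "metadata": {
                "iteration": iteration,
                "script_hash": script_hash,
                "valid_votes": valid_votes,
            },
        })


chain = []
run_vote_loop("count votes", [[True, False], [True]], chain)
print(len(chain), chain[-1]["metadata"]["valid_votes"])
```

Recording the script hash with each iteration would let a later reader check that the logic which produced the vote count was not changed between iterations.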
According to another particularity, the conditions relate to the data received, and the changes of property detected or generated by the computing resource; or the state of the blockchain.
According to another particular feature, the database offers a means of identifying the processes related to the storage of the transaction group, to the block queue, to the block chain and to the deposit of invalid blocks, to the management of blocks, agents, data objects, services and transaction tracking.
In another feature, the database storage and blockchain consensus processes each constitute a layer of architecture that can be deleted, replaced or extended with other features.
According to another particular feature, the data traceability platform comprises at least one estimation module, said module comprising at least one set of models and programs intended to estimate the contribution of different sets of data for the creation of a new set and thus to estimate a fair remuneration scale for the reuse of the data set.
According to another particular feature, the set of models and programs comprises at least one Shapley value estimation model, said Shapley value estimating the value of a data set and/or the relative contribution of the input data sets to the value of the resulting data set.
According to another particular feature, the estimation module is used for the remuneration of each user who participated in the creation of the new data set according to the amount of the participation.
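To make the Shapley-based remuneration concrete, the following Python sketch computes exact Shapley values for a small number of input data sets. The value function and its figures are invented for illustration, and the description does not say whether the estimation module computes exact values or an approximation; exact computation is only practical for a handful of data sets because it enumerates every subset.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values; `value` maps a frozenset of players to a number."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that p joins.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result


# Hypothetical worth of a resulting data set built from inputs A and/or B.
WORTH = {
    frozenset(): 0.0,
    frozenset({"A"}): 50.0,
    frozenset({"B"}): 30.0,
    frozenset({"A", "B"}): 100.0,
}

shares = shapley_values(["A", "B"], WORTH.__getitem__)
print(shares)
```

With these invented figures, A is credited 60 and B 40: each provider receives the average of its marginal contribution over the orders in which the data sets could have been combined, and the shares add up to the worth of the full combination.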
BRIEF DESCRIPTION OF THE DRAWINGS
Other characteristics and advantages of the present invention will appear more clearly on reading the description below, made with reference to the accompanying drawings, in which:
- Figure 1 is a schematic representation of the elements of the distributed database (2);
- Figures 2a, 2b, 2c and 2d are schematic representations, respectively, of a data provider (3) which creates a service using the data traceability platform (1) (DTP), of a data provider (3) which provides a service to a data consumer using the data traceability platform, of the process of tracking the use of the data consumed using the data traceability platform, and of the use of the platform to estimate the relative contribution of the input data sets to the value of the resulting data set, according to one embodiment;
- Figure 3 is a representation of tables containing the attributes relating to a data object, according to one embodiment;
- Figure 4 is a representation of tables containing the attributes relating to a block, according to one embodiment;
- Figure 5 is a representation of tables containing the attributes relating to the transaction group of the storage layer, according to one embodiment;
- Figure 6 is a representation of tables containing the attributes relating to the queue of blocks of the storage layer, according to one embodiment;
- Figures 7a and 7b are representations of tables containing attributes relating respectively to storage management and block voting, according to one embodiment;
- Figure 8 is a representation of tables containing the attributes relating to the transactions in the storage layer, according to one embodiment;
- Figure 9 is a representation of tables containing the attributes relating to the "owner of an agent" of the traceability layer, according to one embodiment;
- Figure 10 is a representation of tables containing the attributes relating to an “agent” of the traceability layer, according to one embodiment;
- Figure 11 is a representation of tables containing the attributes relating to the “services” of the traceability layer, according to one embodiment;
- Figure 12 is a representation of tables containing the attributes relating to the transactions of the traceability layer, according to one embodiment.
DETAILED DESCRIPTION
The present invention relates to a distributed technical infrastructure for tracking data.
In certain embodiments, the distributed technical infrastructure intended for an agent-based data traceability platform (DTP) system (1) comprises at least one metadata database server (2) with layered architecture (FIG. 1) and with computing resources, a data quality monitoring server, a management server for a data provisioning service (DPS), and an automatic interrogation machine, and is characterized in that the database (2) is used to store data and maintain secure evidence of data exchange, data ownership and transfer of ownership of data in data objects (DO) to enable a user (4) to access data sets (DO1, ..., DOn) supplied by other users, and to propose a data set accessible by another user (4), and in which the data traceability platform (1) adapts blockchain technology (22) for the secure monitoring of the ownership of a data set (DO) to be stored in the database (2), and implements a consensus process by federation (FC), during which a voting module decides on the validity of a block in order to store the block in the database (2).
The platform (1) (DTP) can comprise at least one set of layers and at least:
• a central layer which holds the platform layers (1) and allows all the layers to access the controllers of the other layers.
• a network layer which keeps a list of the peers that are currently online. Thus, a higher layer can check whether a peer and its services are currently online. The network layer has no knowledge of the upper layers. However, a network parameter "PeerProfile" is provided with an agent field and a signature field. This enables the link between a peer and an agent of the platform, as well as cryptographic integrity.
Before going further in the description of the present invention, some of the terms used or which can be used must be defined.
The term “agent” designates the smallest autonomous entity of the platform (1) which offers and / or uses the services of the platform (1). The term "Agent" is further characterized by the following:
• "Agent application": technically, the agent is a kind of software application which interacts with the platform (1) and includes the related communication logic and the collaborative protocol;
• “Agent operator”: an agent (an application) can be controlled by a person, by software logic (such as a database (2) daemon, a data analysis controller or a software agent) or by an artificial intelligence that interacts with the agent application. This logic can be integrated into embedded hardware (such as smart sensors).
• "Agent machine": the agent machine is the computer on which the agent application is run. For simplicity, we assume an agent application running on a dedicated machine. An agent machine can be linked to other machines to provide the service more efficiently (such as a cluster of big data analytics servers). These machines serve only as support for the execution of the agent service, and are not directly connected to the platform.
• “Agent legal representative”: each agent has a single person who acts as legal representative, and who is responsible for 1) the agent's operations on the platform (1) and 2) cash transfers, such as paying for a service. For the sake of simplicity, we assume that these two responsibilities are assumed by one person, while several agent machines can have the same agent legal representative.
• "Platform Agent (1)": this is a special agent who works on behalf of the supplier of the platform (1) (3, Figures 2a, 2b) and who focuses on the availability and accuracy of the service available. Thus, it performs proof checks (provenance metadata, for example) and supports certain operations, such as dissemination and caching of data.
The term "service" refers to a set of "operations" offered by an agent, the "service provider (3, Figures 2a, 2b)" or the data provider, to another agent, the "consumer of the service". "Service consumption" means that the consumer of a service (or the user of a service (4, Figures 2b, 2c), or a consumer of data) uses a specific operation offered by the service provider (3). In what follows, if not explicitly mentioned, we use the term "service" interchangeably with the term "operation (service)".
Some of the terms linked to a service offered on the platform (1) are, for example and without limitation, the following:
• A "service call", which designates the request, by an agent, for the consumption of a service offered by a provider (3).
• A "service activity", which refers to the activity carried out by a service provider (3) to perform a specific service (and for which a consumer can pay fees). The activity includes:
- either the provision of a data object prepared in advance;
- or the provision of a data object created on demand by a certain processing (taking into account input data objects or external resources);
- or the processing of data without delivering a data object as a service (such as the delivery of a reputation, or the modification of access control rights).
• The term "service metadata" describes the service and the corresponding service activity. More particularly, it contains (if provided by the agent) detailed provenance metadata to allow the traceability of data objects as a service (such as the processing of scripts in source code for the creation of the data object).
• The term "service entry" designates the data objects (own or from other services) used as input for the execution of a service operation by the service provider (3). In terms of programming languages, this represents the input parameters of a data processing process.
• The term "service exit" designates the result of the service activity which is transmitted to the consumer of the service. It includes at least one status code and usually one or more data objects as well.
• “type of service” defines the way in which the services are performed. There are two ways:
- A "pull service", which is executed reactively. The consumer of a service makes a service call to a service provider (3) in order to consume the service offered. The provider (3) performs the service activity and returns the service output to the consumer.
- A "push service", which is executed proactively. The consumer of the service subscribes to a specific service by making a service call. The service provider (3) performs the service and provides the service on the basis of an event or periodically.
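The difference between the two types of service can be sketched in a few lines of Python. The class ServiceProvider and its methods are hypothetical names chosen for this illustration; the description does not define a programming interface.

```python
class ServiceProvider:
    """Hypothetical provider offering one operation in both service types."""

    def __init__(self, operation):
        self.operation = operation
        self.subscribers = []

    def call(self, request):
        # Reactive (pull) type: the consumer asks, the provider answers at once.
        return self.operation(request)

    def subscribe(self, callback):
        # Proactive (push) type: the service call only registers the consumer.
        self.subscribers.append(callback)

    def on_event(self, event):
        # Proactive (push) type: the result is delivered when an event occurs.
        result = self.operation(event)
        for callback in self.subscribers:
            callback(result)


provider = ServiceProvider(lambda x: x * 2)
print(provider.call(21))          # reactive consumption

received = []
provider.subscribe(received.append)
provider.on_event(5)              # proactive delivery to every subscriber
print(received)
```

A periodic delivery would work the same way, with a timer rather than an external event triggering `on_event`.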
The term "data object" (DO) refers to an entity that encapsulates data for exchange between agents in the context of service ingress or egress. The "origin of the data object" (see data object metadata) specifies how the data object was created. It generally relates to a service transaction and, thus, to a specific service / agent, where the actual creation must be described using provenance metadata. However, agents can also create a data object for the service entry (possibly using another data object and external resources). In this case, the agent should ideally provide the corresponding provenance metadata in the same way as the service provider (3).
A data object can be one of the following "types of data objects":
• A “set of values”
• A “data set”
• A “data flow”
• “Generic data”
"Data object metadata" describes the data object and its context. It contains, among other things, the type of data object, the date and time of creation, the service responsible for creation, the service provider (3), information on integrity and security (checksums, digital signatures, etc.), and identification of the owner (see Figure 3).
Provenance indicates, from a top-down point of view, the possibility (or ability) of deriving the history of the data through the calculation steps during processing, starting with the initial sources. Conversely, we consider traceability as having bottom-up characteristics, indicating the possibility (or ability) to follow the relationships between data and data changes in order to determine the originating processes. Therefore, we consider that traceability denotes the technical aspects of data association, while provenance addresses the overall concept of determining provenance information (such as the processes leading back to the original creation of the data).
A provenance system is, in general, characterized by at least three main characteristics: provenance orientation, pattern and capture mechanisms.
The orientation of the provenance can be data oriented or calculation oriented. In the first case, the provenance system captures information on the flow and transformation of the data. In the second case, which corresponds to the calculation orientation, the information captured is the details of the data transformation (data processing).
In the provenance system, the provenance model used to capture information is generally of two types: retrospective and prospective. In a retrospective model (RPM), the annotations on the flow and the transformation of the data are recorded. The retrospective model can keep the annotations on the data and / or their processing. Conversely, the prospective model (PPM) offers a "recipe" (a set of steps, a method, ...) which describes how to process the data. Thus, a prospective model is generally focused on processing in order to be able to easily repeat the same calculation. The data traceability platform implements a retrospective model by automatically preserving the proofs of interactions (the proofs of consumption of a service are for example kept by means of the "traceability transaction", which contains detailed information on the supplier, the consumer, the service, etc.), which are cryptographically secure to guarantee data protection.
The capture mechanisms, essentially used in a provenance system, are mechanisms that are based on processes and / or an operating system.
As far as process-based mechanisms are concerned, these are closely linked to workflow management systems (WMS). Thus, the provenance information indicates the links between the inputs and outputs of the calculations, which determine a process.
Process-based mechanisms store computational information from within an application, that is, they require their own process to document computational tasks.
Operating system-based mechanisms (also known as "environmental capture") require no changes to the application itself. They use the features or characteristics of an operating system (or an environment through wrappers) to capture data. Thus, the processes of running an application are considered a black box, while input and output can be captured. A graphical representation is a common visualization and is used by mechanisms that rely on a process management system and an operating system. However, they present fundamentally different approaches.
Many provenance systems include at least one of the main characteristics described above, such as, without limitation, Sumatra, Taverna or Vistrails. The most common systems are process-based, and target reproducible calculations and thus clearly traceable data. Typically, they specify the calculation in advance, while retrospectively recording additional information. However, none of the systems is suitable for being directly applied for data traceability. Process management systems are overly focused on central management and the processes themselves, that is, they specify the processes (or service orchestration) in advance. Systems whose data capture is based on an operating system are irrelevant, since they break the surveillance hypothesis.
Proven provenance systems are promising candidates, however. In these systems, an agent specifies the processing of data through assertions, and indicates with metadata the relationships with other input data sets. It also corresponds to an organized and decentralized process management system, focused on processing while also capturing processes. Adequate processing can then be verified through checks, with a particular set of data.
In the present invention, a block chain system (22) is adapted to provide several advantages in terms of provenance and traceability.
The blockchain (22) is not a traditional provenance system, but rather a type of distributed database (2). The blockchain (22) is capable of storing information securely through cryptographic evidence, without the need for a central (trusted) instance. The blockchain (22) (see Figure 1) stores information in the form of blocks, which are concatenated into a chain and stored in a distributed manner. Each block has a reference to the previous block, which makes it possible to traverse the entire chain starting from the last block. Since new blocks can be added at the same time, forks can appear and produce several last blocks. Only one chain can be valid, so as to guarantee a single last block, and that chain is determined by a consensus protocol. Each block includes several transactions, which are secured with asymmetric cryptography. Each new transaction is committed by the owner of the current transaction, which creates a chain of committed transactions. As proof of validity, the owner of the current transaction digitally signs a hash value for the new transaction. The hash value uses the current transaction and the public key of the owner of the new transaction as input. It is important to note that the blockchain (22) has no state. The transactions stored in a block are the only existing state.
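By way of illustration only, the linking of blocks through a reference to the previous block, and the traversal of the chain from the last block, can be sketched as follows. The names Block, append and walk_back are hypothetical and do not come from the platform; SHA-256 over a JSON serialization is an assumption made for the sketch, and transactions are reduced to plain strings.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass
class Block:
    payload: List[str]           # transactions, simplified to strings
    previous_id: Optional[str]   # None for the first block of the chain

    @property
    def block_id(self) -> str:
        # The ID is a hash of the block content, so it also commits to
        # the ID of the previous block.
        content = json.dumps(
            {"payload": self.payload, "previous_id": self.previous_id},
            sort_keys=True,
        )
        return hashlib.sha256(content.encode("utf-8")).hexdigest()


def append(store: Dict[str, Block], payload: List[str],
           previous: Optional[Block]) -> Block:
    """Create a block that references `previous` and store it by ID."""
    block = Block(payload, previous.block_id if previous else None)
    store[block.block_id] = block
    return block


def walk_back(store: Dict[str, Block], last: Block) -> Iterator[Block]:
    """Traverse the chain from the last block back to the first one."""
    current: Optional[Block] = last
    while current is not None:
        yield current
        current = store.get(current.previous_id) if current.previous_id else None


store: Dict[str, Block] = {}
first = append(store, ["tx-a"], None)
second = append(store, ["tx-b", "tx-c"], first)
third = append(store, ["tx-d"], second)
chain = [block.payload for block in walk_back(store, third)]
```

In this sketch, changing the content of an earlier block changes its ID, so the reference held by the following block no longer matches.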
The term "transaction" generally refers to all messages exchanged to ensure the consumption of a service. A transaction, as for databases (2), is either executed in full (successful consumption of the service, with all exchanges of related messages), or in failure (no consumption of service).
The term "hash" refers to the value returned by a hash function. A hash function is any function that can be used to map data of arbitrary size to data of fixed size.
An example of use is a data structure called a "hash table", widely used in computer software to quickly find data. Hash functions speed up searches in a table or a database (2), for example by detecting duplicate records in a large file or by finding similar sections in DNA sequences. They are also useful for cryptography. A cryptographic hash function makes it easy to verify that certain input data maps to a given hash value, but if the input data is unknown, it is deliberately difficult to reconstruct it (or an equivalent alternative) from the stored hash value. This is used to guarantee the integrity of the data transmitted.
Creating a block capable of storing transactions in the block chain (22) requires solving a certain cryptographic puzzle. Generally, as in Bitcoin for example, this involves finding a nonce value (a random or pseudo-random number intended to be used only once) such that the hash value for the transactions to be stored begins with a group of zeros. This is also known as "mining" (Bitcoin); the miner is rewarded with bitcoins for the computational effort. A new block, if verified by other miners, is then added to the block chain (22), which serves as storage confirmation for the associated transactions.
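The nonce search described above can be illustrated with the following simplified sketch. It is not the Bitcoin algorithm itself, which hashes a block header and compares the result with a numeric target; here, as an assumption, a single SHA-256 hash over a string of transactions and a nonce must start with a given number of hexadecimal zeros. The function name mine and the sample transactions are hypothetical.

```python
import hashlib
from typing import Tuple

TRANSACTIONS = "alice->bob:5;bob->carol:2"  # hypothetical sample data


def mine(transactions: str, zero_prefix: int,
         max_tries: int = 1_000_000) -> Tuple[int, str]:
    """Try successive nonces until the hash starts with `zero_prefix` zeros."""
    target = "0" * zero_prefix
    for nonce in range(max_tries):
        digest = hashlib.sha256(f"{transactions}|{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
    raise RuntimeError("no suitable nonce found within max_tries")


nonce, digest = mine(TRANSACTIONS, zero_prefix=3)
```

Finding the nonce takes many hash evaluations, whereas checking it takes a single one, which is what allows other participants to verify a proposed block cheaply.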
By miner, we mean a person who has at least a hardware and software architecture to implement the mining process above.
In some embodiments, a block is defined by a table (see FIG. 4) which contains the following attributes: ID, timestamp, status, dm_version, payload, creator, voters, comment, signature, ID of the previous block.
It should be understood that the consensus by federation (FC) of the present invention is very different from other approaches to block validity, and in particular from the "proof of work" approach, which computes a valid block through "block mining" (by solving a puzzle) and in which the validity of a block is checked individually by each user of the block chain. Instead, the federated consensus of the present invention operates as a pure distributed system, in which the voting module of each data traceability platform agent (the active entity of a traceability platform instance) of a "platform federation" (a subset of all available platform agents) casts an individual vote on validity, and the accumulated total of votes ultimately determines the validity of the block. Significant differences also lie in the configurability, in particular of the actual size of the DTP federation, of its participants, and of the validity threshold.
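As a minimal sketch of such a vote accumulation, assuming that the validity threshold is expressed as a minimum number of valid votes, the decision for one block could be tallied as follows. The class BlockVotes and its fields are hypothetical names chosen for the illustration; signatures on the votes are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class BlockVotes:
    voters: Set[str]        # agent IDs of the federation for this block
    min_valid_votes: int    # configurable validity threshold (assumption)
    votes: Dict[str, bool] = field(default_factory=dict)

    def cast(self, agent_id: str, is_valid: bool) -> None:
        # Only agents listed as voters of the block may vote.
        if agent_id not in self.voters:
            raise ValueError(f"{agent_id} is not a voter for this block")
        self.votes[agent_id] = is_valid

    def status(self) -> Optional[bool]:
        """True: block valid; False: block invalid; None: still undecided."""
        valid = sum(1 for vote in self.votes.values() if vote)
        invalid = len(self.votes) - valid
        if valid >= self.min_valid_votes:
            return True
        # Invalid once the threshold can no longer be reached.
        if len(self.voters) - invalid < self.min_valid_votes:
            return False
        return None


accepted = BlockVotes(voters={"a", "b", "c"}, min_valid_votes=2)
accepted.cast("a", True)
accepted.cast("b", False)
pending_status = accepted.status()
accepted.cast("c", True)

rejected = BlockVotes(voters={"a", "b", "c"}, min_valid_votes=2)
rejected.cast("a", False)
rejected.cast("b", False)
```

The federation size and the threshold are the two parameters of the sketch, mirroring the configurability mentioned above.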
The transactions also include transaction scripts which must execute without errors for the transaction to be accepted in a block. Said transaction scripts generally verify the hash values and the signatures of the transactions (a transaction type referred to as "pay-to-pub-key-hash"). However, the scripting language used offers almost 200 commands, called "opcodes", which include support for cryptographic operations. This allows complex logic for storing transactions.
In a blockchain (22), there is no inherent notion of identities or individual accounts that "own" a transaction. In a blockchain, ownership simply means knowing a private key capable of producing a signature that redeems the transaction.
A blockchain (22) integrates the aforementioned technological features: transactions and scripts, which relate to the format of a transaction, i.e. the manner in which a transaction is technically specified and its correctness is validated; a consensus protocol, which specifies the chain that will be considered valid when several blocks are valid in terms of the cryptographic puzzle; and a block mining feature, which specifies an adequate cryptographic puzzle in order to find a valid block identifier (ID) and which can also manage the appropriate rewards for the miners for their computational efforts. In addition to these features, a blockchain (22) can also include at least one peer-to-peer (P2P) communication module which manages at least the distribution of blocks and/or transactions on the platform (1).
The above features may allow a blockchain (22) to be used as a distributed evidence management system for data traceability. In fact, the blockchain (22) intrinsically performs proof management on a P2P system without the need for a central trusted instance. In addition, traceability is naturally ensured through chain architecture. In addition, all transactions and the data they contain are automatically verifiable. However, aspects such as the consensus protocol algorithm and mining rewards must be based on informed decisions, as they affect the overall stability and performance of the distributed platform.
As far as the provenance system is concerned, as a rule, the block chain (22) can be characterized by the following:
• data-oriented capture information: the block chain (22) aims to persist conventional data within the distributed block chain (22). However, the transaction format is generally modified in order to execute additional functionalities (such as smart contracts).
• retrospective provenance model: a block chain (22) generally has a retrospective provenance model, given that the output (result) of certain processing operations or other available data is stored.
• process-based capture mechanism: the blockchain (22) is process-based, since the actual application must persist the data within the chain (22).
To "persist" data means to "store" data in a persistent manner, in other words to save it so that it does not disappear when, for example, the process or the processing program ends.
In certain embodiments, the distributed technical infrastructure comprises at least one public key infrastructure (PKI) which allows the use of asymmetric cryptographic tools to secure the transactions of a set of data or data objects (DO) and track ownership of a dataset.
In some embodiments, an agent must create a platform account in order to participate in the platform (1) and designate a legal representative, responsible for the actions of the agent, whose identity (ID) is verified. During the creation of the account, a unique public and private key pair is generated for the agent in order to be able to use at least one asymmetric cryptographic tool of the public key infrastructure. In some embodiments, said unique public and private key pair may also allow said agent to act as an active DTP entity, which is active on a machine running a DTP instance and which is responsible for the actions of the instance by means of digital signatures, made with said private key, of all communications.
In some embodiments, the distributed technical infrastructure includes at least one access and control system (AC) which maintains access to available services and/or which controls the creation of fraudulent services within the platform, while the access rights are specified by each service provider (3), which creates a proof of access right for the user concerned using the blockchain data management functionalities (a proof document is created with the block chain transaction "CREATE", whose data model indicates the current owner by means of an attribute or field "c_owner", and is cryptographically secured by the digital signature of the service provider). Using the access and control system, an agent must specify in advance, by means of assertions, the services offered and the service activity carried out. The assertions can then be checked after the consumption of the service and can thus detect any fraud specified above. Indeed, the assertions specify the service activity, and the service actually rendered is verified against them. If there is no correspondence, access is not granted, and the service and the result are stored in a memory of the technical infrastructure in order to carry out additional checks. "No access granted" means that the service itself is visible, but that consumption of this service will be refused.
The term "data model" refers to a model that documents and organizes data, how it is stored and accessed, and the relationships between different types of data.
In some embodiments, the access and control system allows inter- and intra-service traceability, since the service activity can be followed in detail if appropriate assertions are provided by a service provider. For each consumable digital service (and the output data objects offered through a service), the service provider (3) specifies, using an interactive interface which is based on a Web browser (and which creates data models that can be processed by the data traceability platform), provenance metadata that allow inter-service and intra-service traceability. The service providers (3), within the platform, provide at least the details of the calculations performed and of the planned output data objects.
The term "inter-service traceability" refers to monitoring the orchestration of the service (if any) and the use of external resources. This is done using a specification of the services used through the service provider during service creation (such as an interactive service through a web browser user interface that creates usable data models by the data traceability platform), stored in the correlated data models of the platform (for example, a service can indicate "other services used"). Since the data traceability platform is a private system, any user must accept the general conditions, which include the accuracy of the information indicated, thus representing protection by law, the breach of which may be detected by the data analysis processes of the platform and ultimately be the subject of legal proceedings, also using the evidence of traceability stored on the data traceability platform.
"Intra-service traceability" means tracking how the service itself is performed, that is, calculations related to a service and the creation of output data objects (including any models / algorithms, input values, etc.).
In certain embodiments, the block chain (22) is adapted to include a prospective model allowing a service provider to provide calculation-oriented information, by storing at least the assertions of an access and control (AC) approach within a transaction, thus containing purely prospective information on the calculation which specifies the actual logic executed (such as intermediate calculation results which make it possible to follow the calculations or to statistically assess the confidence level of the results) and which may have different levels of abstraction (for example, source code, workflow diagrams, pseudo-code).
In some embodiments, the block chain system (22) is adapted to include a retrospective model by taking advantage of the functionality of storing and recording data (notary functionality), i.e. (cryptographic) evidence according to which the data was provided or stored with the integrated data management operations of the blockchain of the data traceability platform (creation, update, transfer, deletion, activation, deactivation) in order to maintain processing information. This includes information about the logic actually executed before the execution itself (e.g. use of other services during calculation, calculation results).
In some embodiments, the block chain system (22) is adapted to include a pro / retrospective model with the functionality of a distributed process management system. This is made possible by combining pro and retrospective information, such as knowledge of the service execution logic and the results of intermediate calculations, in order to allow the monitoring of calculations and to assess the reliability of the results.
In some embodiments, the database (2) (DB) is designed to handle high workloads and parallel operation. It is distributed, with linear scalability and performance optimized for write operations, which represent the main operations of the target platform. Said database (2) can also support heterogeneous data, in which unassigned data fields do not consume or reserve storage. Preferably, the underlying database (2) is Apache Cassandra. Said Apache Cassandra database (2) can provide at least one fault-tolerant consensus algorithm, such as, for example, and without limitation, PAXOS, which guarantees the accuracy of distributed data replication and, therefore, excellent consistency of the database (2).
In certain embodiments, the database (2) comprises at least one storage layer and a traceability layer.
The storage layer comprises at least one set of modules:
• a transaction pool module (20), comprising the following attributes (see FIG. 5): ID, dm_version, payload, timestamp, operation, previous_tid, c_owner, f_owner, payload_properties, comment, data_other, data_dle, signature, responsible_agent, to process incoming transactions that will be stored in blocks in the block queue (21);
• a block queue module (21), comprising the following attributes (see FIG. 6): id, dm_version, payload, timestamp, transactions, creator, voters, comment, status, signature, to process the blocks which are not yet validated and which are subject to consensus by federation for the distributed validation of blocks;
• a block chain module, comprising the following attributes (see FIG. 4): id, dm_version, payload, timestamp, creator, voters, comment, status, signature, ID of the previous block, to process validated blocks;
• an invalid block deposition module (23), comprising the same attributes as the block chain module, to process the blocks which have been found invalid during the validation of the blocks by consensus by federation;
• a block voting module, comprising the following attributes (see FIG. 7b): ID, dm_version, payload, voter_pubkey (public key), timestamp, current_blockid, ID of the previous block, is_valid, vote_invalid_reason, signature;
• a storage comprising at least the following attributes (see FIG. 7a): ID, timestamp, status, previous_blockid, node_pubkey, signature, type, dm_version, transaction_id, indices, layer, data_dle_data_meta, data_other, data_dle_count, to store and process the additional entities that form part of the blocks and transactions, in particular the votes required for block voting, the block statuses used to keep the result of a block validity vote, and the data lake (24) elements (DLE), which are extracted to improve performance because of their variable, and potentially large, data size.
In some embodiments, the transactions contained in the storage layer are defined in tables illustrated in FIG. 8, which contain the following attributes: ID, dm_version, payload, t_index (transaction index, i.e. the storage index in the corresponding block), timestamp, operation, previous_tid (transaction ID), c_owner (current owner, i.e. the public key of the current owner), f_owner (fulfils the condition defined by the "current owner" field of the previous transaction, i.e. a signature with the private key which corresponds to the public key of the current owner), payload_properties, comment, data_other, signature, blockid, dle_count (data lake element count).
The term "data lake" (24) refers to a large, object-oriented storage directory that stores data in its native format until it is used.
In certain embodiments, the storage layer (FIG. 1) of the database (2) comprises at least one transaction group storage (20), a block queue storage (21), a blockchain storage (22), an invalid block storage (23) and a data lake (24).
In certain embodiments, the incoming transactions are stored in the transaction group storage (20), said transactions then being processed by the transaction group module (20), which extracts them and saves them in the form of blocks in the block queue (21). The blocks stored in the block queue (21) are then subject to block validation via consensus by federation. Validation is carried out using the block voting module. The block queue module (21) extracts the blocks that have been determined to be valid and forwards them to the blockchain storage (22). Blocks that have been determined to be invalid are passed to the invalid block storage. In the latter case, all the transactions of said invalid blocks are copied back, using the invalid block deposition module (23), to the transaction group storage (20), so that they get a second chance. In some embodiments, at least a portion of the incoming transactions are copied to the data lake (24).
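The flow between the storages described above can be sketched, under simplifying assumptions, as a single processing round. The function run_round and the vote callback are hypothetical; the callback stands in for the whole federated vote on one block, and transactions are reduced to strings.

```python
from collections import deque
from typing import Callable, Deque, List


def run_round(pool: Deque[str], block_size: int,
              vote: Callable[[List[str]], bool],
              chain: List[List[str]],
              invalid_blocks: List[List[str]]) -> None:
    """One pass: pool -> block queue -> vote -> chain or invalid storage."""
    # Transaction group module: drain the pool into blocks in the block queue.
    block_queue: Deque[List[str]] = deque()
    while pool:
        size = min(block_size, len(pool))
        block_queue.append([pool.popleft() for _ in range(size)])

    # Block voting: every queued block ends up either valid or invalid.
    while block_queue:
        block = block_queue.popleft()
        if vote(block):
            chain.append(block)
        else:
            invalid_blocks.append(block)
            pool.extend(block)  # the transactions get a second chance


pool: Deque[str] = deque(["t1", "t2", "bad", "t4"])
chain: List[List[str]] = []
invalid_blocks: List[List[str]] = []
run_round(pool, 2, lambda block: "bad" not in block, chain, invalid_blocks)
```

After this round, the valid transaction "t4" is back in the pool together with "bad" and can be placed in a different block during a later round.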
In certain embodiments, the storage management module manages the incoming transactions. Indeed, the execution of the storage management module on the processor allows the transaction group storage (20) at least to receive the raw transactions, to verify the transactions using a deep validation algorithm, to assign the incoming transactions to a DTP agent by means of a selection strategy (randomized for the distribution of the workload, for example), and to add said verified transactions to the transaction group storage (20). Said verified transactions are referred to as transactions with strong clearance.
In certain embodiments, the deep validation (DV) algorithm makes it possible to validate arbitrary data successively across all the architectural layers of the data traceability platform, thereby also validating data that may be deeply nested. The validation logic is provided through "validators", which perform validation in a semantically encapsulated form (such as cryptographic validity checks or data model consistency checks) and are optimized for concurrent execution (on hardware with multiple cores or multiple CPUs). Deep validation is used to validate any data of the traceability platform, and validators can be used and extended by individual applications running "on top of" the platform, thereby serving as a general-purpose validation routine.
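A possible reading of deep validation, given here only as a sketch, is a set of small validator functions applied to every nested element of a data model. The validators has_id and has_signature, the helper iter_nodes and the dictionary layout are assumptions made for the illustration; a thread pool stands in for the concurrent execution mentioned above, and real validators would perform cryptographic checks rather than presence checks.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterator, List

Validator = Callable[[dict], List[str]]  # returns a list of error messages


def has_id(node: dict) -> List[str]:
    return [] if "id" in node else ["?: missing id"]


def has_signature(node: dict) -> List[str]:
    if node.get("signature"):
        return []
    return [f"{node.get('id', '?')}: missing signature"]


def iter_nodes(data: Any) -> Iterator[dict]:
    """Yield every dictionary found in arbitrarily nested data."""
    if isinstance(data, dict):
        yield data
        for value in data.values():
            yield from iter_nodes(value)
    elif isinstance(data, list):
        for item in data:
            yield from iter_nodes(item)


def deep_validate(data: Any, validators: List[Validator]) -> List[str]:
    """Apply each validator to each nested node and collect the errors."""
    jobs = [(check, node) for node in iter_nodes(data) for check in validators]
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda job: job[0](job[1]), jobs)
        return [error for errors in results for error in errors]


block = {
    "id": "b1",
    "signature": "sig-b1",
    "transactions": [
        {"id": "t1", "signature": "sig-t1"},
        {"id": "t2"},  # nested element without a signature
    ],
}
errors = deep_validate(block, [has_id, has_signature])
```

An application running on top of the platform could extend the list of validators without changing deep_validate itself.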
In certain embodiments, the transaction group module (20), executed by the assigned DTP agent which is responsible for it, identifies the verified transactions, i.e. the transactions with strong clearance, creates blocks from the cleared transactions, transmits the blocks to be stored in the block queue (21), and removes the used transactions from the transaction group storage (20).
In certain embodiments, the execution of the voting module makes it possible at least to check whether the “ID” of an agent (identity) is in a list of voters stored in the storage layer of the database, to execute an in-depth validation process to determine the validity of a block, to add a vote to the block corresponding to the validation result, and to trigger or not a timer for each missing vote.
The storage layer also includes at least:
• a list of agents and services that exist on the platform. Said agents and services have been created and stored in a block which has been validated. Said list takes into account all the states created, deactivated and deleted.
• a cache memory (BBC) of blockchain blocks (22), i.e. a cache of a certain number of blocks for which a decision has been made (stored in the blockchain (22) or in the invalid block repository (23)) and which are kept in memory for performance reasons. Said cache memory is implemented as a queue of limited size (a configuration parameter) which uses underlying HashMaps for an efficient lookup by block ID, transaction ID, etc.
• a consensus block cache memory (CBC), which is analogous to the block chain cache memory (22) and which is used for blocks for which no decision has been made.
• a voting list, kept by each agent, which includes the blocks for which a vote is required from the corresponding agent, that is to say when the agent ID has been added to the list of "voting agents" of the block (IDs of the agents who must provide a vote) but no vote has yet been provided.
• a transaction group cache (PTC), which stores the transactions originating from the transaction group (20).
• a list of transaction authorizations (TLC). Said list contains at least the following information in order to decide whether a transaction can be added to a block intended to be stored:
- a previous transaction (if any) is stored in a validated block
- an agent issuing a transaction is not blacklisted
- all the data (DLE, agent, etc.) are available in the system in order to verify them.
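The three conditions listed above can be expressed as a simple check, shown here as a sketch. The Transaction fields and the function is_cleared are hypothetical names; the sets passed as arguments stand in for the information held by the platform.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Transaction:
    tid: str
    issuer: str                         # ID of the issuing agent
    previous_tid: Optional[str] = None  # None if there is no previous transaction
    required_data: Set[str] = field(default_factory=set)


def is_cleared(tx: Transaction, validated_tids: Set[str],
               blacklist: Set[str], available_data: Set[str]) -> bool:
    """Decide whether a transaction may be added to a block."""
    # The previous transaction, if any, is stored in a validated block.
    previous_ok = tx.previous_tid is None or tx.previous_tid in validated_tids
    # The issuing agent is not blacklisted.
    issuer_ok = tx.issuer not in blacklist
    # All referenced data (DLE, agent, etc.) is available for verification.
    data_ok = tx.required_data <= available_data
    return previous_ok and issuer_ok and data_ok


validated = {"t0"}
blacklist = {"mallory"}
available = {"dle-1", "agent-alice"}

ok = is_cleared(Transaction("t1", "alice", "t0", {"dle-1"}),
                validated, blacklist, available)
orphan = is_cleared(Transaction("t2", "alice", "t9"),
                    validated, blacklist, available)
banned = is_cleared(Transaction("t3", "mallory"),
                    validated, blacklist, available)
```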
The traceability layer comprises at least one "agents" module intended for storing the traceability agent data model, a data object (DO) module intended for storing the data model of traceability data objects (DO), a "service" module intended to store the data model of the traceability service, and a "transactions" module intended to store the transactions of the traceability layer.
All entities provide the hash value of the data model as a unique ID in the "ID" field. This is sufficient in particular for data objects and transactions of traceability entities, which are always unique in the system (updates to a data object become a separate database entry). In some embodiments, the traceability entities are also identified by a "static ID" which does not change during an update (as in the case of a change of the e-mail address of the owner of an agent); the static IDs are the following:
• Agent owner (AgentOwner): "agent_owner_id" (unique ID)
• Agent: "agent_id" (public key)
• Service: "service_id" (unique ID) + "service_version"
The aforementioned "IDs" (hash of the data model) of the entities in the traceability layer (Agent owner, agent, service, etc.) are used for the identification of a specific database entry. Updating the "Agent owner", "Agent" and "Service" entities may cause a problem at first glance, since the hash changes. Consequently, a "static ID", a universal unique identifier (UUID) or a public key is used as another unique identifier which does not change when the data model is updated (such as the telephone number of the agent owner).
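The relationship between the hash-based "ID" and the static ID can be sketched as follows. The helper names model_id, new_agent_owner and update are hypothetical, as is the choice of SHA-256 over a sorted JSON serialization; only the field name agent_owner_id is taken from the description, and the "ID" field is written "id" in the sketch.

```python
import hashlib
import json
import uuid


def model_id(model: dict) -> str:
    """Hash of the data model content, excluding the ID field itself."""
    body = {key: value for key, value in model.items() if key != "id"}
    serialized = json.dumps(body, sort_keys=True)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


def new_agent_owner(email: str) -> dict:
    # The static ID is a UUID that is generated once and never changes.
    model = {"agent_owner_id": str(uuid.uuid4()), "email": email}
    model["id"] = model_id(model)
    return model


def update(model: dict, **changes: str) -> dict:
    """Return a new entry: the hash ID changes, the static ID does not."""
    updated = {**model, **changes}
    updated["id"] = model_id(updated)
    return updated


version_1 = new_agent_owner("owner@example.org")
version_2 = update(version_1, email="new.owner@example.org")
```

Both versions remain separate database entries, each identified by its own hash, while the shared agent_owner_id links them to the same agent owner.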
The term "UUID" refers to a 128-bit number used to uniquely identify an object or entity on the Internet.
In some embodiments, the “Agent owner” entity includes at least one of the following attributes (see Figure 9): ID, dm_version, digest, agent_owner_id, asset_index, payload, first_name, last_name, date of birth, address, phone number, email, photo, comment, data_other, data_dle, dle_count, country, city, sblock_id, stransaction_id, stransaction_op, stimestamp, while the attributes sblock_id, stransaction_id, stransaction_op and stimestamp are linked to the transactions contained in the storage layer.
In certain embodiments, the “Agent” entity comprises at least one of the following attributes (see FIG. 10): ID, dm_version, agent_id, asset_index, payload, agent_version, agent_owner, agent_type, agent_name, agent_picture, comment, data_other, data_dle, dle_count, sblockid, stransaction_id, stransaction_op, stimestamp, while the attributes sblockid, stransaction_id, stransaction_op and stimestamp are linked to the transactions contained in the storage layer.
In certain embodiments, the “Service” entity comprises at least one of the following attributes (see FIG. 11): ID, dm_version, asset_index, payload, service_id, service_version, provider_agent, description, result_description, result_type, input_param_descr, input_param_types, output_do, used_service_id, used_service_version, used_service_op, api_specification, is_traceable, comment, data_other, data_dle, dle_count, sblock_id, stransaction_id, stransaction_op, stimestamp, while the attributes sblock_id, stransaction_id, stransaction_op and stimestamp are linked to the transactions contained in the storage layer.
In some embodiments, the transactions contained in the traceability layer include at least one of the following attributes (see Figure 12): ID, dm_version, asset_index, payload, provider_agent, consumer_agent, service_id, service_version, service_op, input_param, output_do, result, comment, data_other, data_dle, dle_count, sblockid, stransaction_id, stransaction_op, stimestamp, while the attributes sblockid, stransaction_id, stransaction_op and stimestamp are linked to the transactions contained in the storage layer.
The storage layer comprises at least one set of modules which allow data storage and validation. To perform these operations, i.e. the storage and validation of data, at least one set of requests is necessary. In order to achieve high performance for data storage and associated queries, important data is kept in memory, by running some type of sliding windows on the data that is going through the validation process. Thus, only data validation is important for the data model.
In some embodiments, for high performance, a data model that targets block-based validation may be considered. This means that requests for information linked to a block must be dealt with efficiently. At least one of the following main requests is taken into account:
• Obtaining blocks by ID;
• Obtaining the block creation time;
• Obtaining storage transactions by block ID;
• Obtaining status information by block ID;
• Obtaining votes by block ID;
• Obtaining DLE by storage transaction ID.
In some embodiments, at least one of the following requests is taken into account for the traceability of the data in the traceability layer, which are carried out as database requests with the corresponding information:
• Obtaining the transaction to be traced by data object ID;
• Obtaining the transaction to be traced by service ID;
• Obtaining agents by transaction ID to trace;
• Obtaining services by transaction ID to trace;
• Obtaining agents by agent owner ID;
• Obtaining services by service ID.
The requests on the traceability layer are not time-critical.
In some embodiments, the voting module is characterized by high performance and configurability. Voting can be adjusted to a reduced federation size, such as one limited to the trusted DTP agents of the provider of the data traceability platform, which reduces voting time and storage latencies. Moreover, a voted block can be stored instantly and, because of the architectural design, does not require a minimum storage period as in other systems, in particular those which use a "proof of work" consensus (like Bitcoin, with its block mining). In addition, both the storage layer technology (Apache Cassandra database) and the voting module are scalable by design, which provides a high total storage throughput.
In certain embodiments, the configurability of the distributed technical infrastructure is provided by a program and by layer-specific configuration files, each layer of said infrastructure being able to provide individual configurations. Indeed, the distributed infrastructure comprises at least one set of tools which make it possible to define the architectural layers of the platform (1) by configuration, without modifying any program stored in the memory of said platform. Thus, the optimal layers can be chosen or replaced by other layers. In some embodiments, the storage is separated into a storage infrastructure (database (2)) and a storage management (block chain (22)), which have complete and independent configurations; this contributes to the high configurability (both of the architecture of the platform (1) and of the layer-specific configuration).
In some embodiments, the data provisioning service (DPS) keeps proof of activity (data provision, access to data) transparent to the user.
In some embodiments, the data provisioning service (DPS) maintains the data provisioning activity, which requires indicating related metadata, such as references to the data sets used by others.
In some embodiments, the data provisioning service (DPS) keeps the complete history of a data set by analyzing the evidence and following the references within the metadata, in order to allow the complete history of a set of data to be monitored and to identify all the users who contributed to its creation. This is done by analyzing the data models stored in the storage layer, following their correlated information (for example, the traceability transaction stores information on the consumption of a service, with the agent of the supplier / consumer, the service used, the input / output data model, etc.) and by creating a graph data structure which represents the traceability of the data of all the entities of the platform (data object, service, agent, traceability transaction, etc.).
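Following the references from a data set back to its inputs amounts to a graph traversal, which may be sketched as follows. The mapping produced_by, the function contributors and the sample agents and data objects are hypothetical; each entry stands for one traceability transaction reduced to its provider agent and its input data objects.

```python
from typing import Dict, List, Set, Tuple

# output data object -> (provider agent, input data objects used)
TraceEntry = Tuple[str, List[str]]


def contributors(data_object: str,
                 produced_by: Dict[str, TraceEntry]) -> Set[str]:
    """Collect every agent that contributed to `data_object`, directly or not."""
    seen: Set[str] = set()
    agents: Set[str] = set()
    stack = [data_object]
    while stack:
        current = stack.pop()
        if current in seen or current not in produced_by:
            continue
        seen.add(current)
        agent, inputs = produced_by[current]
        agents.add(agent)
        stack.extend(inputs)  # follow the references to the input data sets
    return agents


produced_by: Dict[str, TraceEntry] = {
    "DO1": ("alice", []),
    "DO2": ("bob", []),
    "trained_model": ("carol", ["DO1", "DO2"]),
    "report": ("dave", ["trained_model"]),
}
report_contributors = contributors("report", produced_by)
```

In this example, the providers of the two training data sets are identified as contributors to the report even though the report only references the trained model.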
In some embodiments, the data provisioning service (DPS) extends the functionality (characteristic) of pure ownership with operations (creation, update, transfer, deletion, deactivation, activation) intended to maintain the data sets on the platform.
In some embodiments, the platform (1) receives a request from a complex data analysis service which uses machine learning (ML) models and sets of training data available on the platform (1) in order to offer a machine learning analysis service.
In some embodiments, the system includes:
• a block chain (22); and
• a computation resource provided for executing a loop such that the execution of the loop is influenced by the state of the block chain (22), the loop being implemented using a script, in order to keep a count of one or more votes for block validation linked to consensus, block validity or block validity decisions, and a corresponding block status generated for or associated with the block extracted from the block queue (21); a set of invalidity conditions defined by the federation to determine the validity of the blocks (and validated by each agent participating in a vote) is evaluated, and at least one action is taken based on the result of the evaluation, said action comprising at least:
- causing at least one transaction to be written to the blockchain (22) storage of the database (2); and/or
- causing an action outside the block chain (22) to be executed.
In some embodiments, for each iteration of the loop, a cryptographic hash (hash value) of the script is generated, and information relating to at least one iteration of the loop is stored in a blockchain (22) transaction; preferably, the information is stored in the form of metadata in the transaction.
In some embodiments, the conditions relate to the data received and to the changes of ownership (through the block chain storage operation "TRANSFER", with c_owner defining the new owner) detected or generated by the computing resource, or to the state of the blockchain (22).
In some embodiments, the database (2) provides a means of identifying the processes related to the transaction group storage (20), the block queue (21), the block chain (22) and the invalid block repository (23), as well as block management, agents, data objects, services and transaction tracking. More specifically, all data models provide information about the issuing agent and are cryptographically protected by means of signatures, the authenticity of which can be verified with the public key of the agent concerned, which is publicly known from its data model in the agent database table, making all interactions with the database not only traceable but also cryptographically secure. The database itself is strictly “append-only” for reasons of traceability. For example, updates are made as an additional entry in the database (with a reference to the previous state), and deletion from the database is not even implemented; any deletion (in the case of dishonest behavior) is detected thanks to the cryptographic relationships.
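The append-only behavior can be illustrated with a small hash-chained store. This is a simplified stand-in: it uses SHA-256 hash links from the Python standard library in place of the agent signatures described above, and the class and function names are invented for the example.

```python
import hashlib
import json


def entry_hash(payload, previous_hash):
    """Hash an entry together with the hash of the entry it refers to."""
    body = json.dumps({"payload": payload, "previous": previous_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class AppendOnlyStore:
    """Store that only supports appending; an update is a new linked entry."""

    def __init__(self):
        self._entries = []

    def append(self, payload):
        previous = self._entries[-1]["hash"] if self._entries else None
        entry = {
            "payload": payload,
            "previous": previous,
            "hash": entry_hash(payload, previous),
        }
        self._entries.append(entry)
        return entry["hash"]

    def entries(self):
        return list(self._entries)


def verify(entries):
    """Return False if an entry was altered or removed from inside the chain."""
    previous = None
    for entry in entries:
        if entry["previous"] != previous:
            return False
        if entry["hash"] != entry_hash(entry["payload"], previous):
            return False
        previous = entry["hash"]
    return True


store = AppendOnlyStore()
store.append({"id": "DO1", "owner": "alice"})
store.append({"id": "DO1", "owner": "bob"})    # update recorded as a new entry
store.append({"id": "DO1", "owner": "carol"})
tampered = store.entries()
del tampered[1]                                # simulate a dishonest deletion
print(verify(store.entries()), verify(tampered))
```

Note that hash links alone do not reveal the removal of the most recent entry; in the platform described here, signatures and the consensus process would be expected to cover that case.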
Agents are the only active entities on the platform (1) (controlled by human users or software logic); they work on behalf of an agent owner, the human user (4) being legally responsible for the actions of the agents he owns. The validity of the blocks is determined by checking a set of “invalidity conditions” (any invalidity condition that is met generates a block invalidity vote), and a configurable threshold of valid votes must be reached, each vote being cryptographically secured with a signature from the issuing agent.
In certain embodiments, the database storage and blockchain consensus processes (22) each constitute a layer of the architecture which can be deleted, replaced or extended with other characteristics, such as, for example, a data value estimation layer.
In certain embodiments, the data traceability platform (1) comprises at least one estimation module comprising at least one set of models and programs intended to estimate the contribution of different sets of data for the creation of a new set and, thus, to estimate a fair remuneration scale for the reuse of the data set.
In some embodiments, the set of models and programs includes at least one Shapley value estimation model. The Shapley value can estimate the value of a data set and/or the relative contribution of input data sets to the value of the resulting data set. For example, and without limitation, an original data set can be shared on the platform (1) by a first user. Said data set can be modified by a second user (4, FIG. 2c) (data consumer) or a third user (4) and reused on the platform. The traceability tools of the platform (1) of the present invention make it possible to follow the shared source of the data sets and each of their transformations. Said information is then supplied as input to the estimation module, which in turn delivers the contribution of each user (4) to the new data set (the data set which results from all the transformations), see Figure 2d.
Said results can be used for the remuneration of each user (4) who participated in the creation of the new data set according to the amount of the participation.
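As an illustration of how such an estimation could work, the sketch below computes exact Shapley values by averaging each data set's marginal contribution over all orderings. The worth assigned to each combination of data sets is invented for the example; how the platform would actually measure that worth is not specified here.

```python
from itertools import permutations


def shapley_values(players, value):
    """Average each player's marginal contribution over every ordering."""
    totals = {player: 0.0 for player in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for player in order:
            with_player = coalition | {player}
            totals[player] += value(with_player) - value(coalition)
            coalition = with_player
    return {player: totals[player] / len(orderings) for player in players}


# Hypothetical worth of the data set obtained from each combination of inputs.
WORTH = {
    frozenset(): 0.0,
    frozenset({"DO1"}): 40.0,
    frozenset({"DO2"}): 20.0,
    frozenset({"DO1", "DO2"}): 100.0,
}

shares = shapley_values(["DO1", "DO2"], WORTH.__getitem__)
print(shares)
```

With these assumed figures the shares add up to the worth of the combined data set, so they could be read directly as a remuneration split between the two providers. The exact computation grows factorially with the number of inputs, so a larger case would need sampling or another approximation.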
The present application describes various technical characteristics and advantages with reference to the figures and / or to various embodiments. Those skilled in the art will understand that the technical characteristics of a given embodiment can in fact be combined with characteristics of another embodiment unless the reverse is explicitly mentioned or it is obvious that these characteristics are incompatible or that the combination does not provide a solution to at least one of the technical problems mentioned in the present application. In addition, the technical characteristics described in a given embodiment can be isolated from the other characteristics of this mode unless the reverse is explicitly mentioned.
It should be obvious to those skilled in the art that the present invention allows embodiments in many other specific forms without departing from the scope of the invention as claimed. Consequently, the present embodiments should be considered by way of illustration, but may be modified in the field defined by the protection requested, and the invention should not be limited to the details given above.
Claims (39)
1. Distributed technical infrastructure intended for an agent-based data traceability platform system (1) (DTP) comprising at least one metadata database server (2) with layered architecture and computing resources, a data quality monitoring server, a management server for a data provisioning service (DPS), and an automatic querying machine, characterized in that the database (2) is used to store data and maintain secure evidence of data interchange, data ownership and transfer of ownership of data in data objects (DO) to allow a user (4) to access data sets (DO1, ..., DOn) provided by other users, and to propose a data set accessible by another user (4), and in which the data traceability platform is configured to securely track the ownership of a data set (DO) to be stored in the database (2) and to set up a consensus process by federation (FC) by adapting blockchain technology (22), a voting module deciding on the validity of a block in order to store the block in the database (2) during said process.
2. Distributed technical infrastructure according to claim 1, characterized in that the voting module of each agent of the data traceability platform of a "DTP federation", during the consensus by federation (FC), casts an individual vote on validity, and the cumulative total of votes determines the validity of the block, at least the size of and the participants in the DTP federation and the validity threshold being configurable.
3. Distributed technical infrastructure according to claim 1, characterized in that the blockchain technology (22) is adapted to include a forward-looking model allowing a service provider to provide calculation-oriented information, by storing at least the assertions of an access and control (AC) approach within the framework of a transaction, thus containing purely forward-looking information on the calculation which specifies the actual logic executed.
4. Distributed technical infrastructure according to claims 1 to 3, characterized in that the blockchain technology (22) is configured to include a retrospective model, via the data storage and logging functionality used to maintain the processing information.
5. Distributed technical infrastructure according to claims 1 to 4, characterized in that the blockchain technology (22) is configured to include a mixed pro / retrospective model with the functionality of a distributed process management system, the combination of pro and retrospective information enabling the calculation to be monitored and the reliability of the calculation result to be assessed.
6. Distributed technical infrastructure according to claim 1, characterized in that it comprises at least one public key infrastructure (PKI) which allows the use of asymmetric cryptographic tools to secure the transactions of a set of data or objects. (DO) and track ownership of a data set.
7. Distributed technical infrastructure according to claim 6, characterized in that a unique public and private key is generated for the creation of an agent account, in order to allow said agent to use at least one asymmetric cryptographic tool of the public key infrastructure.
8. Distributed technical infrastructure according to claim 7, characterized in that said unique public and private key also allows said agent to act as an active DTP entity, which is active on a machine that executes a DTP instance and which is responsible for the actions of the instance by digitally signing all communications with the private key.
9. Distributed technical infrastructure according to claim 1, characterized in that it comprises at least one access and control (AC) system which maintains access to available services and/or which controls the creation of fraudulent services within the platform, the access rights being specified by each service provider (3), which creates a proof of access right for the user concerned using the data management functionalities of the blockchain, said AC system allowing inter- and intra-service traceability if appropriate assertions are provided by a service provider (3).
10. Distributed technical infrastructure according to claim 9, characterized in that the service provider (3) specifies, for a consumable digital service, using an interactive interface based on a web browser, source metadata which allow inter-service and intra-service traceability, and at least details about the calculation performed and the planned output data objects (DO).
11. Distributed technical infrastructure according to claim 1, characterized in that the database (2) is designed to handle high workloads and parallel functionalities, said database being distributed with linear scaling performance and optimized for write operations.
12. Distributed technical infrastructure according to claim 1, characterized in that the database (2) is Apache Cassandra.
13. Distributed technical infrastructure according to claim 1 or 11 or 12, characterized in that the database (2) comprises at least one storage layer comprising at least a transaction group storage (20), a block queue storage (21), a block chain storage (22), an invalid block storage (23), and a data lake (24).
14. Distributed technical infrastructure according to claims 1 and 13, characterized in that said storage layer comprises at least one set of modules: a transaction group module (20) intended to process the transactions which enter the transaction group storage (20) and will be stored in the blocks of the block queue (21), a block queue module (21) intended to process the blocks which are not yet validated and which are submitted to consensus for distributed block validation, a block chain module (22) intended to process the validated blocks, an invalid block repository module (23) intended to process the blocks which have been judged invalid during consensus-based block validation by a block voting module, and a storage management module intended to store and process additional entities which are part of the blocks and transactions, in particular the votes which are required for block voting, the block statuses used to persist the result of a block validity vote, and the data lake elements (DLE) (24), which are extracted to improve performance because of their variable data size.
15. Distributed technical infrastructure according to claim 14, characterized in that the execution of the storage management module allows the storage of transaction groups (20) at least to receive the incoming raw transactions, to verify the transactions using a deep validation algorithm, assigning incoming transactions to a DTP agent through a selection strategy, and adding said verified transactions to the storage of transaction groups (20).
16. Distributed technical infrastructure according to claim 15, characterized in that the deep validation (DV) algorithm makes it possible to subsequently validate arbitrary data across all the DTP architectural layers, thereby validating data that may be deeply nested, while the validation logic is provided by "validators" that perform validation in a semantically encapsulated form optimized for concurrent execution.
17. Distributed technical infrastructure according to claims 14 and 15, characterized in that the transaction group module (20) identifies the verified transactions using the assigned DTP agent which is responsible for the execution of the group module , creates blocks using transaction authorization, passes blocks to be stored in the block queue (21), and removes used transactions from the storage of transaction groups (20).
18. Distributed technical infrastructure according to claims 1 to 17, characterized in that the execution of the voting module makes it possible at least to verify whether the "ID" (identity) of an agent is in a list of voters stored in the storage layer of the database, to execute a deep validation process to determine the validity of a block, to add a vote to the block corresponding to the result of the validation, and to trigger or not a timer for each missing vote.
19. Distributed technical infrastructure according to claims 1 and 13, characterized in that the storage layer comprises at least:
• a list of agents and services that exist on the platform, said agents and services having been created and stored in a block which has been validated, and said list taking into account all the created, deactivated and deleted states;
• a block chain block cache (BBC) (22), which is a cache of a certain number of blocks for which a decision has been made (stored in the blockchain (22) or in the invalid block repository (23)) and which are kept in memory for performance reasons, said cache being implemented as a queue of limited size (configuration parameter) which uses underlying HashMaps for an efficient search by at least block IDs (block identities) and transaction IDs;
• a consensus block cache (CBC), which is analogous to the block chain block cache (BBC) and is used for blocks for which no decision has been made;
• a voting list, kept by each agent, which includes the blocks for which a vote is required from the corresponding agent, that is to say when the agent ID has been added to the list of "voting agents" of the block (IDs of the agents who must provide a vote), but no vote has yet been provided;
• a transaction group cache (PTC), which stores the transactions originating from the transaction group (20);
• a list of transaction authorizations (TLC), said list containing at least the following information in order to decide whether a transaction can be added to a block intended to be stored:
- a previous transaction (if any) is stored in a committed block;
- the agent issuing a transaction is not blacklisted;
- all the data (DLE, agent, etc.) are available in the system in order to verify them.
20. Distributed technical infrastructure according to claims 1 and 13, characterized in that the traceability layer comprises at least an "agents" module intended to store the data model of the traceability agents, a data objects (DO) module intended to store the data model of the traceability data objects (DO), a "service" module intended to store the data model of the traceability service, and a "transactions" module intended to store the transactions of the traceability layer.
21. Distributed technical infrastructure according to claim 20, characterized in that the traceability entities are also identified by a "static ID" which does not change during an update and comprises at least:
• Agent owner (AgentOwner): "Agent_owner_id" (unique ID)
• Agent: "agent_id" (public key)
• Service: "service_id" (unique ID) + "service_version"
22. Distributed technical infrastructure according to claim 19, characterized in that, in the storage layer, at least one of the following requests is taken into account to guarantee the efficient processing of information linked to a block:
• Obtaining blocks by ID;
• Obtaining the block creation time;
• Obtaining storage transactions by block ID;
• Obtaining status information by block ID;
• Obtaining votes by block ID;
• Obtaining DLEs by storage transaction ID.
23. Distributed technical infrastructure according to claims 20 and 21, characterized in that the following requests are taken into account for the traceability of the data in the traceability layer, which are carried out as database requests with the corresponding information:
• Obtaining the transaction to be traced by data object ID;
• Obtaining the transaction to be traced by service ID;
• Obtaining agents by transaction ID to trace;
• Obtaining services by transaction ID to be traced;
• Obtaining agents by agent owner ID;
• Obtaining services by service ID.
24. Distributed technical infrastructure according to claim 1, characterized in that the voting module, having high-performance and configurability functionality, allows:
• the vote to be adjusted to a reduced federation size, with a reduced voting system time and reduced storage latencies;
• the vote to be stored instantly, without the architectural design requiring a minimum storage period.
25. Distributed technical infrastructure according to claim 24, characterized in that the database storage layer technology is configured to scale its architectural design and obtain a high total storage throughput.
26. Distributed technical infrastructure according to claims 1 and 24, characterized in that the configurability functionality of the distributed technical infrastructure is ensured by a program and configuration files specific to the layers, each layer of said infrastructure being capable of providing individual configurations.
27. Distributed technical infrastructure according to claim 1, characterized in that the data provisioning service (DPS) keeps proof of activity in a transparent manner for the user.
28. Distributed technical infrastructure according to claims 1 and 28, characterized in that the data provisioning service (DPS) maintains the data provisioning activity, which requires indicating the related metadata as references to the data sets used by others.
29. Distributed technical infrastructure according to claim 1, characterized in that the data provisioning service (DPS) keeps the complete history of a data set by analyzing the evidence and following the references within the metadata, so that the complete history of a data set can be tracked and all the users who may have contributed to its creation can be identified.
30. Distributed technical infrastructure according to claim 1, characterized in that the data provisioning service (DPS) is configured to extend the pure ownership functionality via data set maintenance operations on the platform.
31. Distributed technical infrastructure according to claim 1, characterized in that the platform (1) is configured to receive a request from a complex data analysis service, said analysis service using models of machine learning (ML) and learning data sets available on the platform (1) in order to offer a machine learning analysis service.
32. Distributed technical infrastructure according to claim 5, characterized in that the system comprises:
• a block chain (22); and
• a computation resource provided for executing a loop such that the execution of the loop is influenced by the state of the block chain (22), the loop being implemented using a script, in order to keep a count of one or more votes for block validation linked to consensus or block validity decisions, and a corresponding block status generated for or associated with the block extracted from the block queue (21); a set of invalidity conditions defined by the federation to determine the validity of the blocks, and validated by each agent who takes part in a vote, being evaluated, and at least one action being taken based on the result of the evaluation; said action comprising at least:
- causing at least one transaction to be written to the block chain volume (22) of the database (2); and/or
- causing an off-block-chain action to be executed.
33. Distributed technical infrastructure according to claims 1 to 31, characterized in that, for each iteration of the loop, a cryptographic hash of the script is generated, and the information relating to at least one iteration of the loop is stored in a transaction on the blockchain (22), the information being stored as metadata in the transaction.
34. Distributed technical infrastructure according to claims 1 to 32, characterized in that the conditions relate to the data received and to the changes of ownership detected or generated by the computing resource, or to the state of the blockchain (22).
35. Distributed technical infrastructure according to claims 1 to 33, characterized in that the database (2) offers a means of identifying the processes related to the storage of the transaction group (20), to the block queue (21), blockchain (22) and invalid block repository (23), block management, agents, data objects, services and transaction traceability.
36. Distributed technical infrastructure according to claims 1 to 34, characterized in that the database storage (2) and blockchain consensus (22) processes each constitute a layer of the architecture which can be deleted, replaced or extended with other characteristics.
37. Distributed technical infrastructure according to claim 1, characterized in that the data traceability platform comprises at least one estimation module, said module comprising at least a set of models and programs intended to estimate the contribution of different data sets for the creation of a new set and, thus, to estimate a fair remuneration scale for the reuse of the data set.
38. Distributed technical infrastructure according to claim 37, characterized in that the set of models and programs of the estimation module comprises at least one Shapley value estimation model, said Shapley value estimating the value of a data set and/or the relative contribution of input data sets to the value of the resulting data set.
39. Distributed technical infrastructure according to claims 37 and 38, characterized in that the estimation module is used for the remuneration of each user (4) who participated in the creation of the new data set according to the amount of the participation.
Patent family:
Publication number | Publication date
WO2019106186A1|2019-06-06|
FR3074322B1|2021-04-16|
Cited documents:
Publication number | Filing date | Publication date | Applicant | Title
WO2017109140A1|2015-12-22|2017-06-29|Bigchaindb Gmbh|Decentralized, tamper-resistant, asset-oriented database system and method of recording a transaction|
US20170323392A1|2016-05-05|2017-11-09|Lance Kasper|Consensus system for manipulation resistant digital record keeping|
CN111737352A|2020-06-23|2020-10-02|四川长虹电器股份有限公司|Supply chain information collaborative management method based on block chain|
CN110855761B|2019-10-29|2021-09-21|深圳前海微众银行股份有限公司|Data processing method and device based on block chain system|
EP3913485A1|2020-05-20|2021-11-24|Cleverdist SA|Method and computing platform for controlling the sharing of data streams exchanged between multiple organisations|
Legal status:
2019-05-31| PLSC| Publication of the preliminary search report|Effective date: 20190531 |
2019-11-29| PLFP| Fee payment|Year of fee payment: 3 |
2020-11-26| PLFP| Fee payment|Year of fee payment: 4 |
2021-11-26| PLFP| Fee payment|Year of fee payment: 5 |
Priority:
Application number | Filing date | Title
FR1761423A|FR3074322B1|2017-11-30|2017-11-30|SECURE DATA TRACEABILITY PLATFORM|
FR1761423|2017-11-30|FR1761423A| FR3074322B1|2017-11-30|2017-11-30|SECURE DATA TRACEABILITY PLATFORM|
PCT/EP2018/083221| WO2019106186A1|2017-11-30|2018-11-30|Secure data tracking platform|